Representations and metrics

Question 1

Screenshot taken from Coursera

Answer


In [13]:
import numpy as np

def calculate_weight(feature):
    # Weight each feature by the inverse square of its range (max - min).
    weight = (1 / (max(feature) - min(feature))) ** 2
    return weight

price = calculate_weight(np.array([500000, 350000, 600000, 400000], dtype=float))
room = calculate_weight(np.array([3, 2, 4, 2], dtype=float))
lot = calculate_weight(np.array([1840, 1600, 2000, 1900], dtype=float))

print(price)
print(room)
print(lot)


1.6e-11
0.25
6.25e-06
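
Weights of this form are typically plugged into a scaled Euclidean distance, where each squared feature difference is multiplied by its weight so that wide-ranging features such as price do not dominate. The following is a minimal sketch under that assumption. `scaled_euclidean_distance` is a hypothetical helper name, and treating index i of each array above as one house is also an assumption.

```python
import numpy as np

def calculate_weight(feature):
    # Inverse square of the feature's range (max - min).
    return (1.0 / (np.max(feature) - np.min(feature))) ** 2

# Columns: price, rooms, lot size (values from the answer above).
# Assumption: row i collects the i-th entry of each feature array.
features = np.array([
    [500000, 3, 1840],
    [350000, 2, 1600],
    [600000, 4, 2000],
    [400000, 2, 1900],
], dtype=float)

# One weight per feature column.
weights = np.array([calculate_weight(features[:, j])
                    for j in range(features.shape[1])])

def scaled_euclidean_distance(x, y, w):
    # Hypothetical helper: sqrt(sum_j w_j * (x_j - y_j)^2).
    return np.sqrt(np.sum(w * (x - y) ** 2))

d = scaled_euclidean_distance(features[0], features[1], weights)
print(weights)
print(d)
```

With these weights each feature contributes at most 1 to the squared distance, since the largest possible difference in a feature equals its range.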

Question 2

Screenshot taken from Coursera

Answer

  • Word counts
    • Sentence 1: [2, 1, 1, 1, 1, 1, 1, 1, 0, 0]
    • Sentence 2: [0, 2, 1, 1, 0, 0, 0, 1, 2, 1]
  • Euclidean distance:

In [16]:
import numpy as np

s1 = np.array([2, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
s2 = np.array([0, 2, 1, 1, 0, 0, 0, 1, 2, 1], dtype=float)
print(s1)
print(s2)

euclidean_distance = np.sqrt(np.sum((s1 - s2)**2))
euclidean_distance


[ 2.  1.  1.  1.  1.  1.  1.  1.  0.  0.]
[ 0.  2.  1.  1.  0.  0.  0.  1.  2.  1.]
Out[16]:
3.6055512754639891
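
As a cross-check, the same distance can be obtained from `np.linalg.norm`, which computes the L2 norm of a 1-D array by default. This sketch reuses the two count vectors from the cell above. The squared differences sum to 13, so both values should equal the square root of 13.

```python
import numpy as np

s1 = np.array([2, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
s2 = np.array([0, 2, 1, 1, 0, 0, 0, 1, 2, 1], dtype=float)

# Explicit formula: square root of the sum of squared differences.
manual = np.sqrt(np.sum((s1 - s2) ** 2))

# Library equivalent: L2 norm of the difference vector.
builtin = np.linalg.norm(s1 - s2)

print(manual, builtin)
```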

Question 3

Screenshot taken from Coursera

Answer


In [17]:
import numpy as np

s1 = np.array([2, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
s2 = np.array([0, 2, 1, 1, 0, 0, 0, 1, 2, 1], dtype=float)
print(s1)
print(s2)
cosine_similarity = np.dot(s1, s2)/(np.sqrt(np.sum(s1**2)) * np.sqrt(np.sum(s2**2)))
cosine_distance = 1 - cosine_similarity
cosine_distance


[ 2.  1.  1.  1.  1.  1.  1.  1.  0.  0.]
[ 0.  2.  1.  1.  0.  0.  0.  1.  2.  1.]
Out[17]:
0.5648058601107554
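
One property worth seeing next to this result is that cosine distance ignores vector length. The sketch below wraps the computation in a hypothetical helper, `cosine_distance`, and compares the original vectors with a version where the first count vector is doubled (as if the sentence were repeated twice). The cosine distance stays the same while the Euclidean distance changes.

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - (a . b) / (||a|| * ||b||)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

s1 = np.array([2, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=float)
s2 = np.array([0, 2, 1, 1, 0, 0, 0, 1, 2, 1], dtype=float)

d = cosine_distance(s1, s2)
d_scaled = cosine_distance(2 * s1, s2)  # doubling s1 leaves the angle unchanged

euclid = np.linalg.norm(s1 - s2)
euclid_scaled = np.linalg.norm(2 * s1 - s2)  # Euclidean distance does change

print(d, d_scaled)
print(euclid, euclid_scaled)
```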

Question 4

Screenshot taken from Coursera

Question 5

Screenshot taken from Coursera

Answer

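  • Given the number of documents, $tf \times idf = 0$; assuming the word occurs in the document ($tf > 0$), this means $idf = 0$:
$$idf = \large \log \frac{\text{# docs}}{\text{1 + # docs using word}} = 0$$
$$\large \frac{\text{# docs}}{\text{1 + # docs using word}} = e^0 = 1$$
  • Hence # docs = 1 + # docs using word, i.e. the word appears in all but one of the documents.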
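
A small numeric check of this reasoning, assuming the smoothed definition idf = log(# docs / (1 + # docs using word)). The corpus size of 64 and the helper name `idf` are hypothetical, chosen only for illustration.

```python
import math

def idf(num_docs, num_docs_using_word):
    # Assumed smoothed definition: log(# docs / (1 + # docs using word)).
    return math.log(num_docs / (1 + num_docs_using_word))

num_docs = 64  # hypothetical corpus size

# Word used in all but one document: the ratio is exactly 1, so idf is 0.
idf_common = idf(num_docs, num_docs - 1)

# Word used in only a few documents: the ratio exceeds 1, so idf is positive.
idf_rare = idf(num_docs, 8)

print(idf_common)
print(idf_rare)
```

Under this definition a word that appears in every document would get a slightly negative idf, because the denominator then exceeds the number of documents.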

Question 6

Screenshot taken from Coursera